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Abstract. In 2005, Chen et al introduced a sequential importance sampling (SIS) procedure 
to analyze zero-one two-way tables with given fixed marginal sums (row and column sums) 
via the conditional Poisson (CP) distribution. They showed that compared with Monte Carlo 
Markov chain (MCMC)-based approaches, their importance sampling method is more efficient 
in terms of running time and also provides an easy and accurate estimate of the total number 
of contingency tables with fixed marginal sums. In this paper we extend their result to zero- 
one multi-way (d-way, d > 2) contingency tables under the no d-way interaction model, i.e., 
with fixed d — 1 marginal sums. Also we show by simulations that the SIS procedure with 
CP distribution to estimate the number of zero-one three-way tables under the no three-way 
interaction model given marginal sums works very well even with some rejections. We also 
applied our method to Samson's monks' data set. We end with further questions on the SIS 
procedure on zero-one multi-way tables. 



1. Introduction 

Sampling zero-one constrained contingency tables finds its applications in combinatorics [7], 
statistics of social networks [21 [9], and regulatory networks [6j. In 2005, Chen et al. introduced a 
sequential importance sampling (SIS) procedure to analyze zero-one two-way tables with given 
fixed marginal sums (row and column sums) via the conditional Poisson (CP) distribution [3]. 
It proceeds by simply sampling cell entries of the zero-one contingency table sequentially for 
each row such that the final distribution approximates the target distribution. This method will 
terminate at the last column and sample independently and identically distributed (iid) tables 
from the proposal distribution. Thus the SIS procedure does not require expensive or prohibitive 
pre-computations, as is the case of computing Markov bases for the Monte Carlo Markov Chain 
(MCMC) approach. Also, when attempting to sample a single table, if there is no rejection, 
the SIS procedure is guaranteed to sample a table from the distribution, where in an MCMC 
approach the chain may require a long time to run in order to satisfy the independent condition. 

In 2007, Chen extended their SIS procedure to sample zero-one two-way tables with given 
fixed row and column sums with structural zeros, i.e., some cells are constrained to be zero or 
one [2]. In this paper we also extended the results from [3l [2] to zero-one multi-way (d-way, 
d > 2) contingency tables under the no d-way interaction model, i.e., with fixed d — 1 marginal 
sums. 

This paper is organized as follows: In Section [2] we outline basics of the SIS procedure. In 
Section [3] we focus on the SIS procedure with CP distribution on three-way tables under no 
three-way interaction model. This model is particularly important since if we are able to count 
or estimate the number of tables under this model then this is equivalent to estimating the 
number of lattice points in any polytope This means that if we can estimate the number of 
three-way zero-one tables under this model, then we can estimate the number of any zero-one 
tables by using De Loera and Onn's bijection mapping. 

Let X = (Xijk) of size (m, n, /), where m,n, I 6 N and N = {1, 2, ...,}, be a table of counts 
whose entries are independent Poisson random variables with canonical parameters Here 
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Xijk £ {0, 1}. Consider the generalized linear model, 



\ i \M i \N i \L j \MN | \ML , \ AL 
A + A, ; + A,- + A fc + A,;, + A, :fc + A jfc 



(1-1) %fc 

for j = 1, . . . , m, j = 1, . . . , n, and k = 1, . . . , I where M, N, and L denote the nominal-scale 
factors. This model is called the no three-way interaction model. 



Notice that the sufficient statistics under the model in (f.l) are the two-way marginals, that 



is: 



(1.2) 
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Hence, the conditional distribution of the table counts given the margins is the same regardless 
of the values of the parameters in the model. 

In Section [4] we generalize the SIS procedure on zero-one two-way tables in [3l [2] to zero- 
one multi-way (d-way, d > 2) contingency tables under the no d-way interaction model, i.e., 
with fixed d — 1 marginal sums. In Sections [5] and [6] we show some simulation results with our 
software which is available in http : //www . polytopes .net/code/CP[ Finally, we end with some 
discussions. 



2. Sequential importance sampling 

Let E be the set of all tables satisfying marginal conditions. In this paper we assume that 
S 7^ 0. Let -P(X) for any X £ S be the uniform distribution over S, so p(X.) = 1/|S|. Let q(-) 
be a trial distribution such that g(X) > for all X G S. Then we have 



E 



1 



k(x) 



Thus we can estimate |S| by 



y — 



N 



5(X) 



mi. 



-y— 



where Xi, . . . , Xn are tables drawn iid from <?(X). Here, this proposed distribution q(X) is the 
distribution (approximate) to sample tables via the SIS procedure. 

We vectorized the table X = (xi, • • • , xt) and by the multiplication rule we have 

q(X = (xi, ■ ■ ■ ,x t )) = q(xi)q(x 2 \xi)q(x 3 \x2, xi) ■ ■ ■ q(x t \xt-l, ■ ■ ■ ,x\). 

Since we sample each cell count of a table from an interval we can easily compute q(xi\xi—\, . . . , x{ 
for i = 2, 3, . . . , t. 

When we have rejections, this means that we are sampling tables from a bigger set S* such 
that S C £*. In this CciSG, cis long as the conditional probability c[{xi\xi—\^ . . . , x\ ) for i = 2, 3, . . . 
and q(x\) are normalized, <?(X) is normalized over X* since 

Exes* 



Eu k g(xi)g(x 2 |2;i)g(x3|x2,xi) • • • q(x t \x t -i, . . . ,xi) 

£0*00*1) [£ X2 g(xi|x 2 ) [••• [E^g^tkt-i,---,^)]]] 
1. 



Thus we have 



E 



Jixes 
9(X) 



E 

xes 



(X) 



where Ixes is an indicator function for the set E. By the law of large numbers this estimator is 
unbiased. 
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3. Sampling from the conditional Poisson distribution 

Let 

Z=(Z l ,...,Z l ) 

be independent Bernoulli trials with probability of successes p = (p\, . . . ,pi). Then the random 
variable 

s z = Z 1 + ■ ■ ■ + Z l 

is a Poisson-binomial distribution. 

We say the column of entries for the marginal Xj j 0i + of X is the (io, jo)th column of X 
(equivalently we say (io, &o)th column for the marginal X io+ k and (jo, ko)th column for the 
marginal X + j k ). Consider the (io,jo)th column of the table X for some io G {l,...,m}, 
jo G {1, . . . , n} with the marginal lo = Xi j 0+ . Also we let = X io+ k and Ck = X + j o k- Now let 
Wk = Pk/ (1 - Pk) where p k G (0, 1). Then, 

I 

(3.1) P(Z 1 = Zl ,...,Z l = zi\S z = lo) oc [I w z k ». 

k=l 

Thus for sampling a zero-one table with fixed marginals X + jk, Xi + k for i = 1, 2, . . . , m, j = 
1,2, ... ,n, and k = 1, 2, . . . , I, for Xi j 0+ for each io £ {1, . . . , m} and jo £ {1, . . . , n}, (or one 
can do each Xi 0+ k or X + j fc instead by similar way) one just decides which entries are ones 
(basically there are (^) many choices) using the conditional Poisson distribution above. We 
sample these cell entries with ones (say lo many entries with ones) in the (io, jo)th column for 
the L factor with the following probability: Let A k , for k = 1,...,Iq, be the set of selected 
entries. Thus Ao = 0, and Ai is the final sample that we obtain. At the kth. step of the drafting 
sampling (k = 1, . . . , lo), a unit j £ A c k _^ is selected into the sample with probability 

pr A c n wjRjlo-^Al^-j) 

u ' fc ~ lj (Jo-fc + iW>-fc + MLi)' 

where 

r( S ,a)= £ (n^V 

For sampling a zero-one three-way table X with given two-way marginals Xij + , Xi +k , and 
X+jk for i = 1, 2, . . . , m, j = 1, 2, . . . , n, and A; = 1, 2, we sample for the (io, jo)th column 
of the table X for each io E {1, . . . , m,}, jo G {1, . . . , n}. We set 

rk ■ c k 



(3.2) p fc : 

Thus we have 



rk ■ c k + (n- r k )(m - c k ) 
fk ■ Ck 



(3-3) w k = T u v 

(n - r k )(m - c k ) 

Remark 3.1. We assume that we do not have the trivial cases, namely, 1 < r k < n — 1 and 
1 < Cfe < m — 1. 

Theorem 3.2. For t/ie uniform distribution over all m x n x I zero-one tables with given 
marginals r k = Xi 0+ k, Ck = X + j k for k = 1,2, ... ,1, and a fixed marginal for the factor L, 
lo, the marginal distribution of the fixed marginal Iq is the same as the conditional distribution 



of Z defined by (3.1) given Sz = lo with 

._ r k ■ c k 

n ■ c k + (n - r k )(m - c k ) 
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Proof. We start by giving an algorithm for generating tables uniformly from all m x n x I zero- 
one tables with given marginals r k , c k for k = 1,2, ... ,1, and a fixed marginal for the factor L, 
lo- 

(1) For k = 1, . . . , I consider the feth layer ofmxn tables. We randomly choose r k positions 
in the (io,A;)th column and c k positions in the (jo,fe)th column, and put Is in those 
positions. The choices of positions are independent across different layers. 

(2) Accept those tables with given column sum Iq. 

It is easy to see that tables generated by this algorithm are uniformly distributed over all 
m x n x I zero-one tables with given marginals r k , c k for k = 1, 2, . . . , I, and a fixed marginal for 
the factor L, Iq for the (io, Jo)th column of the table X. We can derive the marginal distribution 
of the jo)th column of X based on this algorithm. At StepJTJ we choose the cell at position 
(io, jo, 1) to put 1 in with the probability: 

/ n— 1\ (m-l\ 

[™Z\) + CV) n • ci + (n - n)(m - d) • 

Because the choices of positions are independent across different layers, after Step [T] the marginal 
distribution of the (io,jo)th column is the same as the distribution of Z defined by ( |3.1[ ) with 

_ \r k - 1) \c k -l) _ r k • Cfc 

Pk " CT-i) G-l) + (V, 1 ) ("I 1 ) ~r k .c k + (n- r k )(m - c k ) ' 

Step [2] rejects the tables whose (io,Jo)th column sum is not Iq. This implies that after Step[2j 
the marginal distribution of the (io, jo)th column is the same as the conditional distribution of 



Z defined by (3.1) with 

rk ■ c k 

Pk = 71 u V 

rk-c k + {n- r k ){m - c k ) 

□ 

Remark 3.3. The sequential importance sampling via CP for sampling a two-way zero-one 



table defined in [3] is a special case of our SIS procedure. We can induce pk defined in (3.2) 



and the weights defined in (3.3) to the weights for two-way zero-one contingency tables defined 
in [3]. Note that when we consider two-way zero-one contingency tables we have Cfc = 1 for all 
k = 1, . . . , I and for all j'q = 1, . . . , n (or r k = 1 for all k = 1, . . . , I and for alHo = 1> • • • ; m ), an d 
m = 2 (or n = 2, respectively). Therefore when we consider the two-way zero-one tables we get 

r k r k 

Pk = — , w k = , 

n n — r k 

or respectively 

Cfc Cfc 

p k = — , w k = . 

m m — Ck 

During the intermediary steps of our SIS procedure via CP on a three-way zero-one table 
there will be some columns for the L factor with trivial cases. In that case we have to treat 
them as structural zeros in the kth slice for some k E {1, . . . , I}. In that case we have to use the 



probabilities for the distribution in (3.1) as follows: 

rk ■ c k 



(3-4) Pk '■— . 1 n , , , n, , • 

rk • Cfc + (n - r k - g k l> ){m -c k - g k ") 

where g r ,° is the number of structural zeros in the (ro,fc)th column and is the number of 
structural zeros in the (cq, k)th column. Thus we have weights: 

rk ■ Cfc 



(3.5) w k 



(n-r k - g r k °)(m - c k - g c k °) ' 
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Theorem 3.4. For the uniform distribution over all m x n x I zero-one tables with structural 
zeros with given marginals r^ = Xi 0+ k, Ck = X + j k for k = 1, 2, . . . , I, and a fixed marginal for 
the factor L, Iq, the marginal distribution of the fixed marginal Iq is the same as the conditional 
distribution of Z defined by (3.1) given Sz = Iq with 



Pk 



rk ■ Ck 



rk ■ Ck + (n - r k - g r k °){m - c k - g c k °) ' 

where g k ° is the number of structural zeros in the (rQ,k)th column and g?° is the number of 
structural zeros in the (co,k)th column. 



Proof. The proof is similar to the proof for Theorem 3.2, just replace the probability pk with 



Pk 



rk ■ Ck 



v rfc— 1 / V Cfc — 1 



+ 



fn-1- 



'■fc 



*°)( 



') r k ■ c k + (n - r k - g r k °)(m 



Ck 



□ 



Remark 3.5. The sequential importance sampling via CP for sampling a two-way zero-one 
table with structural zeros defined in Theorem 1 in [2] is a special case of our SIS. We can 



induce pk defined in (3.4) and the weights defined in (3.5) to the weights for two-way zero-one 
contingency tables defined in [2]. Note that when we consider two-way zero-one contingency 
tables we have Ck = 1 for all k = 1, . . . , / and for all jo = 1, • • • , n (or r& = 1 for all k = 1, . . . , I 
and for all io = 1, . . . , m), m = 2 (or n = 2, respectively), and g^ = (or g r k ° , respectively). 
Therefore when we consider the two-way zero-one tables we get 

rk r k 



or respectively 



Pk 



Pk 



n 



Uk 



Ck 



m 



n c » W k 

Uk 



n-r k 



Ck 



9 r r 



m-c k 



9 C k°* 



Algorithm 3.6 (Store structures in the zero-one table). This algorithm stores the structures, 
including zeros and ones, in the observed table xo. The output will be used to avoid trivial cases 
in sampling. The output A and B matrices both have the same dimension with xo, so the cell 
value in A will be 1 if the position is structured and if not. The matrix B is only for structure 
l's. We consider sampling a table without structure l's, that is, a table with new marginals: 
= Xij + — X^fc=i Bijk = Xij+ — Bij + , X* +k = Xi + k — ^j=i^ijk = Xi + k — Bi + k, arid 
X* +jk = X +jk - J27Li B ijk = X +jk - B +jk for i = 1, 2, . . . , m, j = 1, 2, . . . , n, and k = 1, 2, . . . , I. 

Input The observed marginals Xij+, X i+k , and X + j k for i = 1, 2, . . . , m, j = 1,2, ... ,n, and 
k = 1,2, 

Output Matrix A and B, new marginals X*- + , X* +k , and X^- k for i = 1, 2, . . . , m, j = 1, 2, . . . , n, 
and k = 1, 2, . . . , I. 
Algorithm (1) Check all marginals in direction I. For i = 1, 2, . . . , m: 

If X + jk = 0, Aiij k = 1, for all i' = 1, 2, . . . ,m and Ai'j k = 0; 

If X + jk = 1, Ai'j k = 1 and Bi'j k = 1, for all i' = 1, 2, . . . , m and A^j k = 0. 

(2) Check all marginals in direction J. For j = 1,2, ... ,n: 

If X i+k = 0, Aifk = 1, for all f = 1,2, . . . ,n and A^p. = 0; 

If Xi + k = 1, Aij/k = 1 and Bij'k = 1, for all f = 1,2, ... ,n and Aiji k = 0. 
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A 

K 




Figure 1. An example of a 3 x 3 x 3 table. 



(3) Check all marginals in direction K. For k = 1,2, ... ,1: 

If Xij + = 0, Aijk' = 1, for all k' = 1, 2, . . . , I and A^ = 0; 

If Xij + = 1, = 1 and Bjjfc' = 1, for all k' = 1, 2, . . . , I and Am*,/ = 0. 

(4) If any changes made in step (1), (2) or (3), come back to (1), else stop. 

(5) Compute new marginals: 

= — Bij+, X* +k = Xi + k — Bi + k, and X + j k = X + jk — B + jk for i = 

1, 2, . . . , m, j = 1, 2, . . . , n, and k = 1, 2, . . . , I. 

Algorithm 3.7 (Generate a two-way table with given marginals). This algorithm is used to 
generate a layer (fixed i) of the three-way table, with the probability of the sampled layer. 

Input Row sums r*- and column sums c|, j = 1,2, ... ,n, and k = 1,2, ... , /; structures A; 
marginals on direction I: X + j} t for i = 1, 2, . . . , m. 
Output A sampled table and its probability. Return if the process fails. 
Algorithm (1) Order all columns with decreasing sums. 

(2) Generate the column (along the direction K) with the largest sum, the weights 



used in CP are shown in equation (3.5). Notice that k relates to each specific 



cell in the column, and Ck which are the row sums in the direction J and /, 
respectively. g r k ° and are the number of structures in the rows of the direction 
J and /, respectively. The probability of the generated column will be returned if 
the process succeeds, while may be returned in this step if it does not exist. 
(3) Delete the generated column in (2), and for the remaining subtable, do the following: 
(a) If only one column is left, fill it with fixed marginals and go to (4). 
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(b) If (a) is not true, check all marginals to see if there are any new structures 
caused by step (2). We need to avoid trivial cases by doing this. Go back to 
(1) with new marginals and structures. 
(4) Return generated matrix as the new layer and its CP probability. If failed, return 
0. 

Algorithm 3.8 (SIS with CP for sampling a three-way zero-one table). We describe an algo- 
rithm to sample a three-way zero-one table X with given marginals X%j+, Xt+k, an d X + jk for 
i = 1, 2, . . . , m, j = 1, 2, . . . , n, and k = 1,2, ... ,1 via the SIS with CP. 

Input The observed table xo- 
Output The sampled table x. 
Algorithm (1) Compute the marginals Xij + , Xi +k , and X + jk for i = 1, 2, . . . , m, j = 1,2, ... ,n, 
and k = 1, 2, . . . , I. 



(2) Use Algorithm 3.6 to compute the structure tables A and B. Consider the new 
marginals in the output as the sampling marginals. 

(3) For the sampling marginals, do the SIS: 

(a) Delete the layers filled by structures; consider the left-over subtable. 

(b) Consider the layers in direction / (i varies). Sum within all layers and order 
them from the largest to smallest. 

(c) Consider the layer with the largest sum and plug in the structure table A 
from Algorithm |3.7| to generate a sample for this layer. The algorithm may 
return if the sampling fails. 

(d) Delete the generated layer in (c), and for the remaining subtable, do the 
following: 

(i) If only one layer left, fill it with fixed marginals and go to (e). 

(ii) else, go back to (2) with new marginals. 

(e) Add the sampled table with table B (the structure l's table). 

(4) Return the table in (e) and the same probability with the sampled table. Return 
if failed. 



4. Four or higher dimensional zero-one tables 

In this section we consider a d-way zero-one table under the no d-way interaction model for 
d G N and d > 3. Let X = (X^...^) be a zero-one contingency table of size {n\ x • • • x rid), where 
rii G N for i = 1, . . . , d. The sufficient statistics under the no d-way interaction model are 

(4 1 \ X+i 2 ...i d , Aj 1 +j 3 ...i d , . . . , -Xi 1 ...j d _ 1 -f , 

for i\ = 1, . . . ,n\, h = 1, • • • ,n2, ■ ■ ■ ,id = 1, • • • ,n<j. 

For each v( G {1, . . . , n±}, . . . , G {1, . . . , rid}, we say the column of the entries for a mai- 
ginal A',....,. . . ,.,...,„ the (i , . . . . . , i d )th column of X. For each i\ G {1, . . . ,rii}, . . . ,i d _ x 

{1, . . . ,rid-i}, we consider the (ij, • • • ,i^_ 1 )th column for the cfth factor. Let Iq = A^ ^0 + . 

Let r 3 k = Xjp ^o +i o k for fixed k G {1, . . . For sampling a zero-one d-way table X, 

1 j 1 j -\- 1 d 1 

we set 

/ a ON rij=i r i 

(4.2) p k ■- 



nfcM+nfci( 



no- 



Remark 4.1. We assume that we do not have trivial cases, namely, 1 < ri < rij — 1 for 
j = 1, . . . ,d. 
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Theorem 4.2. For the uniform distribution over all d-way zero-one contingency tables X = 
(-^»i...id) °f s ^ e ( n i x • • • x rid), where rii £ N for i = 1, . . . ,d with marginals Iq = jo + , 

and r 3 k = XjO ...i° d : fc / or & ^ {1; • • • > n dL ^ e marginal distribution of the fixed marginal 



lo is the same as the conditional distribution of Z defined by (3.1) given Sz = lo with 



Pk 



111=14. +n?=iVi - 



a-/ 



Proof. The proof is similar to the proof for Theorem |3.2[ we just extend the same argument to 
a d-way zero-one table under the no d-way interaction model with the probability 



ns o: ) + n?=i (T) m 4 + n - ^ 

k k 

□ 



During the intermediary steps of our SIS procedure via CP on a three-way zero-one table 
there will be some columns for the dth factor with trivial cases. In that case we have to treat 
them as structural zeros in the feth slice for some k E {1, . . . , I}. In that case we have to use the 



probabilities for the distribution in (3.1) as follows: 



(4.3) Pk :-- 



Y\j=l r { +Uj=i( n 3 ~ r k -9k) 



where gj, is the number of structural zeros in the (i^, ■ ■ • , • • • > *rf_i^)th column of X. 

Thus we have weights: 



(4.4) w k 



Y\d-l j 

llj=l 'k 

UjZlinj - r{ - g[) 



Theorem 4.3. For the uniform distribution over all d-way zero-one contingency tables X = 
(Xii...id) °f s ^ ze ( n i x • • • x rid), where raj £ N for i = 1, . . . ,d with marginals Iq = X ti o { o + , 

and r 3 k = XjO jO j° 1 fe f° r A; € {1, ... , ra^}, the marginal distribution of the fixed marginal 



Iq is the same as the conditional distribution of Z defined by (3.1) given Sz = lo with 

n d_1 r j 

' Tld— 1 j , TT<2— 1/ J i\ 

U j= i r i + Uj=i (ni -n -9k) 

where g^, is the number of structural zeros in the (i®, . . . , ij+ij ■ ■ ■ j column of X. 



Proof. The proof is similar to the proof for Theorem 3.4, we just extend the same argument to 



a d-way zero-one table under the no d-way interaction model with the probability 



LLj 

k x * ' k 

□ 



ESTIMATING THE NUMBER OF ZERO-ONE MULTI-WAY TABLES VIA SEQUENTIAL IMPORTANCE SAMPLING 



5. Computational examples 
For our simulation study we used the software package R [10] . We count the exact numbers of 



tables via the software LattE [5] for small examples in this section (Examples (5.2) to (5.13)). 

When the contingency tables are large and/or the models are complicated, it is very difficult 
to obtain the exact number of tables. Thus we need a good measurement of accuracy in the 
estimated number of tables. In [3], they used the coefficient of variation ( 



cv 



cv 



var q {p(X)/q(X.)} 
E2{p(X)/ g (X)} 



which is equal to var q {l/ q(X.)} /E^{1/ g(X)} for the problem of estimating the number of tables. 
The value of cv 2 is simply the chi-square distance between the two distributions p' and q, which 
means the smaller it is, the closer the two distributions are. In [3] they estimated cv 2 by: 



EilxliMXi) - E; v =iiMXj) /n} 2 /(n-i 



cv 



{[£f=iiA?(Xj)] /n} 



where Xi, . . . , Xpj are tables drawn iid from q(X.). When we have rejections, we compute the 
variance using only accepted tables. In this paper we also investigated relations with the exact 
numbers of tables and cv 2 when we have rejections. 

In this section, we define the three two-way marginal matrices as following: 
Suppose we have an observed table x = (xj J fc) mxnx /, i = 1, 2, . . . , m, j = 1,2, 
1,2,...,/; 

Define: si = (X +jk ) nxl , sj = {X i+k ) mxh and sk 



, n, and k 



Example 5.1 (The 3-dimension Semimagic Cube" 
with all l's inside, that is: 



(-^Qj'+)mxn- 

. Suppose si, sj, and sk are all 3 x 3 matrices 



si 



sj = sk 



1 


1 


1 


1 


1 


1 


1 


1 


1 



The real number of tables is 12. We took 114.7 seconds to run 10,000 samples in the SIS, the 
estimator is 12, acceptance rate is 100%. Actually, we found that if the acceptance rate is 100%, 
then sample size does not matter in the estimation. 



We used R to produce more examples. Examples (5.2) to (5.13) are constructed by the same 
code but with different values for parameters. We used the R package "Rlab" for the following 
code. 

seed=6; m=3; n=3; 1=4; prob=0.8; N=1000; k=200 
set . seed(seed) 

A=array (rbern(m*n*l ,prob) , c (m,n, 1) ) 
out inf o=tabinf o ( A) 
numtable (N , out inf o , k) 

Here prob is the probability of getting 1 for every Bernoulli variable, and N is the sample size 
(the total number of tables sampled, including both acceptances and rejections). Notice that 
cv 2 is defined as 



Var 
Mean 2 ' 



Example 5.2 (seed=6; m=3; n=3; 
respectively: 



1=4; prob=0.8). Suppose si, sj, and sk are as following, 



2 


2 


2 


2 


1 


GO 


2 


2 


2 


CO 


3 


2 



2 


3 


2 


2 


1 


3 


CO 


CO 


2 


2 


2 


1 
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The real number of tables is 3. An estimator is 3.00762 with cv 2 
took 13.216 seconds (in R) with a 100% acceptance rate. 



0.0708. The whole process 



Example 5.3 (seed=60; m=3; n=4; 1=4; prob=0.5). Suppose si, sj, and sk are as following, 
respectively: 



2 


2 


2 


1 


h- > 




1 





h- 1 


h- 1 


1 


2 


1 


1 


2 


3 



CO 


CO 


2 


h- 1 


1 





2 


2 


h- 1 


2 


2 


CO 



CO 


2 


2 


2 


1 





2 


2 


CO 


1 


1 


3 



The real number of tables is 5. An estimator is 4.991026 with cv 2 
took 17.016 seconds (in R) with a 100% acceptance rate. 



0.1335. The whole process 



Example 5.4 (seed=61; m=3; n=4; 1=4; prob=0.5). Suppose si, sj, and sk are as following, 
respectively: 



h- 1 


2 


2 


1 







1 


2 


h- 1 





2 


1 





1 


CO 


2 



h- 1 


2 


3 


2 




1 


2 


3 





1 


CO 


1 



CO 


h- 1 


h- 1 


CO 


1 


2 


2 


2 


2 


1 


h- ' 


h- 1 



The real number of tables is 8. An estimator is 8.04964 with cv 2 
took 16.446 seconds (in R) with a 100% acceptance rate. 



0.2389. The whole process 



Example 5.5 (seed=240; m=4; n=4; 1=4; prob=0.5). Suppose si, sj, and sk are as following, 
respectively: 



2 


CO 


CO 


2 




CO 


2 


1 


h- 1 


2 


CO 





4 


2 


2 


2 



2 


2 


4 


h- 1 


CO 


2 


2 


2 


2 


CO 


3 


1 


1 


CO 


1 


1 



2 


2 


3 


2 


CO 


2 


1 


CO 


CO 


2 


2 


2 


2 


1 





CO 



The real number of tables is 8. An estimator is 8.039938 with cv 2 
took 23.612 seconds (in R) with a 100% acceptance rate. 



0.2857. The whole process 



Example 5.6 (seed=1240; m=4; n=4; 1=4; prob=0.5). Suppose si, sj, and sk are as following, 
respectively: 



2 


CO 


2 


CO 


h- 1 


2 


CO 


2 


2 


2 


CO 


2 


3 


2 


CO 


2 



h- 1 


4 


1 


CO 


4 


2 


4 


2 


1 


2 


4 


3 


2 


1 


2 


1 



2 


2 


2 


CO 


CO 


3 


3 


CO 


CO 


2 


2 


CO 


2 


1 


2 


1 



The real number of tables is 28. An estimator is 26.89940 with cv 2 = 1.0306. The whole process 
took 29.067 seconds (in R) with a 100% acceptance rate. It converges even better for sample 
size 5000: the estimator becomes 28.0917, with cv 2 = 1.2070. 

Example 5.7 (seed=2240; m=4; n=4; 1=4; prob=0.5). Suppose si, sj, and sk are as following, 
respectively: 



h- 1 


2 


CO 


1 


2 


CO 


2 


CO 


2 


4 


2 


1 


2 


1 


4 


1 



2 


CO 


2 





3 


2 


CO 


2 




CO 


3 




h- 1 


2 


3 


CO 



2 


1— ' 


2 


2 


CO 


2 


3 


2 


1 


4 


2 


1 


1 


3 


2 


CO 



The real number of tables is 4. An estimator is 3.98125 with cv 2 
took 26.96 seconds (in R) with a 100% acceptance rate. 



0.0960. The whole process 
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Example 5.8 (seed=3340; m=4; n=4; 1=4; prob=0.5). Suppose si, sj, and sk are as following, 
respectively: 



2 


4 


1 


3 


h- > 


2 


1 


2 


(— 1 


1 





3 


4 


1 





2 



2 


1 


1 


2 


CO 


1 


1 


3 


1 


2 





2 


2 


4 





3 



CO 


1 


1 


1 


CO 


1 


2 


2 


1 


2 


1 


1 


CO 


2 


1 


CO 



The real number of tables is 2. An estimator is 2 with cv 2 
seconds (in R) with a 100% acceptance rate. 



0. The whole process took 15.214 



Example 5.9 (seed=3440; m=4; n=4; 1=4; prob=0.5). Suppose si, sj, and sk are as following, 
respectively: 



h- > 


CO 


1 


3 


h- 1 


1 


2 


2 


2 


CO 


1 





3 


2 


2 


3 



2 


2 


2 


2 


2 


1 


2 


1 


1 


CO 


1 


2 


2 


CO 


1 


CO 



CO 


1 


1 


CO 


1 


2 


1 


2 


2 





CO 


2 


2 


3 


1 


3 



The real number of tables is 12. An estimator is 12.04838 with cv 2 
process took 27.074 seconds (in R) with a 85.9% acceptance rate. 



0.7819733. The whole 



Example 5.10 (seed=5440; m=4; n=4; 1=4; prob=0.5). Suppose si, sj, and sk are as following, 
respectively: 



2 







1 


2 


CO 


1 


2 


3 


1 


2 


1 




CO 


2 


2 



2 


CO 


2 


1 


2 


1 


2 


3 


2 


1 





1 


2 


CO 


1 


1 



1 


2 


2 


3 


1 


1 


CO 


CO 


1 


3 








1 


2 


2 


2 



The real number of tables is 9. An estimator is 8.882672 with cv 2 = 0.7701368. The whole 
process took 30.171 seconds (in R) with a 100% acceptance rate. Another result for the same 
sample size is: an estimator is 8.521734, cv 2 = 0.6695902. You can find that the latter has a 
slightly better cv 2 but a slightly worse estimator. We'll discuss more in Section [Tj 

Example 5.11 (seed=122; m=4; n=4; 1=5; prob=0.2). Suppose si, sj, and sk are as following, 
respectively: 



2 





CO 


CO 


2 








1 








1 





1 


1 


1 





1 





1 














2 


1 


h- 1 





2 


1 


1 


1 


1 


1 


1 


1 








2 


1 






3 








1 


4 


1 








1 





3 


1 


2 





1 






The real number of tables is 5. An estimator is 4.93625 with cv 2 
took 21.325 seconds (in R) with a 100% acceptance rate. 



0.2035. The whole process 



Example 5.12 (seed=222; m=4; n=4; 1=5; prob=0.2). Suppose si, sj, and sk are as following, 
respectively: 



h- 1 





h- 1 


1 


1 


2 


1 





1 


2 





1— ' 


1 


1 







(— 1 


1 


1 


1 



2 


1 








2 


1 


2 


1 


2 


1 


1 





1 


1 


1 








1 


1 






2 


CO 








1 


3 


2 


1 








1 


3 


1 








1 



The real number of tables is 2. An estimator is 2 with cv 2 
seconds (in R) with a 100% acceptance rate. 



0. The whole process took 19.064 
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Example 5.13 (seed=322; m=4; n=4; 1=5; prob=0.2). Suppose si, sj, and sk are as following, 
respectively: 



(— ' 


(— 1 


1 


1 


1 


(— ' 


h- 1 


1 


1 


1 


h- 1 


2 








1 


2 





1 


1 


2 









1 


1 





1 





1 





1 


2 


2 





1 


2 


2 


2 


1 


1 


2 






2 








1 








2 


1 


3 


1 


2 


3 





3 


2 



The real number of tables is 5. An estimator is 4.992 with cv 2 
took 23.25 seconds (in R) with a 85.2% acceptance rate. 



0.2179682. The whole process 



Summary 5.14 (Summarize the results from Example ( |5.2[ ) to Example (5.13)). This is only 
a summary of main results of those examples in Table [TJ For all results here, we set the sample 
size 1,000. We will discuss these results in Section [7j 



Dimension 


Example 


# tables 


Estimation 


cv 2 


Acceptance rate 


3x3x4 


5.2 


3 


3.00762 


0.0708 


100% 


3x4x4 


5.3 


5 


4.991026 


0.1335 


100% 




5.4 


8 


8.04964 


0.2389 


100% 


4x4x4 


5.5 


8 


8.039938 


0.2857 


100% 




5.6 


28 


26.89940 


1.0306 


100% 




5.7 


4 


3.98125 


0.0960 


100% 




5.8 


2 


2 





100% 




5.9 


12 


12.04838 


0.7820 


85.9% 




5.10 


9 


8.882672 


0.7701 


100% 


4x4x5 


5.11 


5 


4.93625 


0.2035 


100% 




5.12 


2 


2 





100% 




5.13 


5 


4.992 


0.2180 


85.2% 



Table 1. Summary of Examples (5.2) - (5.13) 



Example 5.15 (High-dimension Semimagic Cubes). In this example, we consider m x n x I 
tables for m = n = I = 4, . . . , 10 such that each marginal sum equals to 1. The results are 
summarized in Table [2j 

Example 5.16 (High-dimension Semimagic Cubes continues). In this example, we consider 
m x n x / tables for m = n = I = 4, . . . , 10 such that each marginal sum equals to s. The results 
are summarized in Table [3j In this example, we set the sample size N = 1000. 

Example 5.17 (Bootstrap-t confidence interval of Semimagic Cubes). As we can see that in 
Table [3J generally we have larger cv 2 when the number of tables is larger, and in this case, the 
estimator we get via the SIS procedure might vary greatly in different iterations. Therefore, we 
might want to compute a (1 — q)100% confidence interval for each estimator via a non-parametric 
bootstrap method (see Appendix [A] for a pseudo code for a non-parametric bootstrap method 
to get the (1 — a)100% confidence interval for |S|). See Table [4]for some results of Bootstrap-t 
95% confidence intervals (a = 0.05). 
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Dimension m N 


CPU time (sec) 


Estimation 


cv 2 Acceptance rate 


4 


1000 


32.44 


568.944 


0.26 


100% 




1 0000 


324.18 


571.1472 


97 


100% 

J- \J\J / (J 


5 


1000 


60.39 


161603.5 


0.18 


99% 




1 0000 


605.45 


161439.3 


18 


99 9% 


6 


1000 


102.66 


801634023 


0.58 


98.3% 




1 0000 


1038.46 


819177227 


45 


98 8% 


7 


1000 


158.55 


6.08928e+13 


0.60 


97% 




10000 


1590.84 


6.146227e+13 


0.64 


97.7% 


8 


1000 


234.53 


1.080208e+20 


1.07 


95.6% 




10000 


2300.91 


1.099627e+20 


1.00 


96.5% 


9 


1000 


329.17 


5.845308e+27 


1.46 


94% 




10000 


3238.1 


5.684428e+27 


1.59 


95.3% 


10 


1000 


451.24 


9.648942e+36 


1.44 


93.3% 




10000 


4425.12 


9.73486e+36 


1.73 


93.3% 


Table 2. 


Summary of computational results on m x n 


x I tables for 


m = n = 


1 = 4,..., 


10. The all marginal sums are 


equal to one in 


this example. 





Dimension m 


s 


CPU time (sec) 


Estimation 


cv 2 


Acceptance rate 


4 


2 


27.1 


51810.36 


0.66 


97.7% 


5 


2 


58.1 


25196288574 


1.69 


97.5% 


6 


2 


97.1 


6.339628e+18 


2.56 


94.8% 




3 


99.3 


1.269398e+22 


2.83 


96.5% 


7 


2 


150.85 


1.437412e+30 


4.76 


93.1% 




3 


166.68 


2.365389e+38 


25.33 


96.7% 


8 


2 


229.85 


5.369437e+44 


6.68 


89.8% 




3 


256.70 


3.236556e+59 


7.05 


94.5% 




4 


328.52 


2.448923e+64 


11.98 


94.3% 


9 


2 


319.32 


4.416787e+62 


8.93 


85.7% 




3 


376.67 


7.871387e+85 


15.23 


91.6% 




4 


549.73 


2.422237e+97 


14.00 


93.4% 


10 


2 


429.19 


2.166449e+84 


10.46 


83.3% 




3 


527.14 


6.861123e+117 


26.62 


90% 




4 


883.34 


3.652694e+137 


33.33 


93.8% 




5 


1439.50 


1.315069e+144 


46.2 


91.3% 



Table 3. Summary of computational results on m x n x I tables for m = n = 
I = 4, . . . , 10. The all marginal sums are equal to s in this example. The sample 
N = 1000 in this example. 



6. Experiment with Sampson's data set 

Sampson recorded the social interactions among a group of monks when he was visiting there 
as an experimenter on vision. He collected numerous sociometric rankings [HE]. The data is 
organized as a 18 x 18 x 10 table and one can find the full data sets at |http : / / vla do . f mf . 
uni-lj . si/pub/networks/data/ucinet/UciData.htm#sampson, Each layer of 18 x 18 table 
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Estimation 


cv 2 




Dim 


s 




Lower 95% 


Upper 95% 


cv 2 


Lower 95% 


Upper 95% 


Acceptance Rate 


7 
< 


o 
z 


1.306480e+30 


1.156686e+30 


1.468754e+30 


3.442306 


2.678507 


4.199513 


yo.o/o 




Q 

o 


3.033551e+38 


2.245910e+38 


4.087225e+38 


22.84399 


8.651207 


35.080408 


yo.z 7o 


o 

O 


2 


5.010225e+44 


4.200752c+44 


5.902405e+44 


6.712335 


4.539368 


8.590578 


nn a 07 




o 
O 


2.902294e+59 


2.389625e+59 


3.484405e+59 


9.047914 


5.680128 


12.797488 


ClQ 1 07 

93.1% 




A 


2.474874e+64 


1.847911e+64 


3.295986e+64 


21.53559 


5.384647 


32.166086 


y 4 . D 70 


9 


2 


4.548401e+62 


3.682882e+62 


5.593370e+62 


10.07973 


4.886817 


15.406899 


87.1% 




3 


9.702672e+85 


7.189849e+85 


1.250875e+86 


18.65302 


11.33462 


23.77980 


92.5% 




4 


2.023034e+97 


1.547951e+97 


2.561084e+97 


14.96126 


10.20331 


19.09515 


92.2% 


10 


2 


2.570344e+84 


1.908609e+84 


3.339243e+84 


17.83684 


9.785778 


24.231544 


84.8% 




3 


8.68783e+117 


5.92233e+117 


1.22271e+118 


29.67200 


18.64549 


37.64892 


90.2% 




4 


4.12634e+137 


2.94789e+137 


5.52727e+137 


23.36831 


15.32719 


31.02614 


92% 




5 


1.54956e+144 


9.85557e+143 


2.24043e+144 


39.06521 


20.23674 


53.60838 


91.8% 



Table 4. Summary of confidence intervals. Dimensions and marginals= s are 
denned same with Table [ij |S| means an estimator of |S| and cv 2 is an estimator 
of cv 2 . The sample size for the SIS procedure is N = 1000 and the sample size 
for bootstraping is B = 5000. Only cases with relatively large cv 2 are involved. 



represents a social relation between 18 monks at some time point. Most of the present data are 
retrospective, collected after the breakup occurred. They concern a period during which a new 
cohort entered the monastery near the end of the study but before the major conflict began. The 
exceptions are "liking" data gathered at three times: SAMPLK1 to SAMPLK3 - that reflect 
changes in group sentiment over time (SAMPLK3 was collected in the same wave as the data 
described below). In the data set four relations are coded, with separate matrices for positive 
and negative ties on the 10 relation: esteem (SAMPES) and disesteem (SAMPDES); liking 
(SAMPLK which are SAMPLK1 to SAMPLK3) and disliking (SAMPDLK); positive influence 
(SAMPIN) and negative influence (SAMPNIN); praise (SAMPPR) and blame (SAMPNPR). In 
the original data set they listed top three choices and recorded as ranks. However, we set these 
ranks as an indicator (i.e., if they are in the top three choices, then we set one and else, zero). 

We ran the SIS procedure with N = 100000 and a bootstrap sample size B = 50000. An 
estimator was 1.704774e+117 with its 95% confidence interval, [1.119321e+117 2.681264e+119] 
and cv 2 = 621 .4 with its 95% confidence interval, [324.29, 2959.65]. The CPU time was 70442 
seconds. The acceptance rate is 3%. 

7. Discussion 

In this paper we do not have a sufficient and necessary condition for the existence of the 
three-way zero-one table so we cannot avoid rejection. However, since the SIS procedure gives 
an unbiased estimator, we may only need a small sample size as long as it converges. For 
example, in Table [TJ all estimators with sample size 1000 are exactly the same as the true 
numbers of tables because they all converge very quickly. Also note that an acceptance rate 
does not depend on a sample size. Thus, it would be interesting to investigate the convergence 
rate of the SIS procedure with CP for zero-one three-way tables. 

It seems that the convergence rate is slower when we have a "large" table (here "large" 
means in terms of |S| rather than its dimension, i.e., the number of cells). A large estimator 

|S| usually corresponds to a larger cv 2 , and this often comes with large variations of |S| and 

cv 2 . This means that if we have a large |S|, more likely we get extremely larger |S| and cv 2 
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and different iterations can give very different results. For example, we ran three iterations for 
the 8x8x8 semimagic cube with all marginals equal to 3 and we got the following results: 
estimator =3.236556e+59 with cv 2 = 7.049114; estimator =2.902294e+59 with cv 2 = 9.047914; 
and estimator =3.880133e+59 with cv 2 = 55.59179. Fortunately, even though we have a large 
|E|, our acceptance rate is still high and a computational time seems to still be attractive. Thus, 
when one finds a large estimation or a large cv 2 , we recommend to apply several iterations and 
pick the result with the smallest cv 2 . We should always compare cv 2 in a large scale. However, 
a small improvement does not necessarily mean a better estimator (see Example 5.10). 

For calculating the bootstrap-t confidence intervals, we often have a larger confidence interval 
when we have a larger cv 2 , and this confidence interval might be less informative and less reliable. 
Therefore we suggest to use the result with the smallest cv 2 for bootstraping procedure. In 
Table |4| we sho wed only confidence intervals for semimagic cubes with m = n = I = 7, . . . , 10 in 
Example |5.17 because of the following reason: When cv 2 is very small, computing bootstrap-t 
confidence interval does not make much sense, since the estimation has already converged. 

For an experiment with Sampson's data set, we have observed a very low acceptance rate 
compared with experimental studies on simulated data sets. We are investigating why this 
happens and how to increase the acceptance rates. 

In [3], the Gale-Ryser Theorem was used to obtain an SIS procedure without rejection for 
two-way zero-one tables. However, for three-way table cases, it seems very difficult because we 
naturally have structural zeros and trivial cases on a process of sampling one table. In [2] Chen 
showed a version of Gale-Ryser Theorem for structural zero for two-way zero-one tables, but it 
assumes that there is at most one structural zero in each row and column. In general there are 
usually more than one in each row and column. 

In this paper the target distribution is the uniform distribution. We are sampling a table 
from the set of all zero-one tables satisfying the given marginals as close as uniformly via the 
SIS procedure with CP. For a goodness-of-fit test one might want to sample a table from the 
set of all zero-one tables satisfying the given marginals with the hypergeometric distribution. 
We are currently working on how to sample a table via the SIS procedure with CP for the 
hypergeometric distribution. 
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Appendix A. Non-parametric bootstrap method 

In this section we explain how to use a non-parametric bootstrap method to get the (1 — 
a) 100% confidence interval for |E|. Notice that the bootstrap sample size is fixed as B, and 
notations here are consistent with Section [2j 

(1) Drawing pseudo dataset. 

Concept In an SIS procedure with sample size N, we get a sequence of random tables 

Xi,...,Xn- Define Yi = , i = 1,...,N where g(X) is the trial distribu- 
tion, then Yi, . . . ,Yn is a sequence of i.i.d random variables. This means that it 
makes sense to consider the empirical distribution of Yi, which is nonpar ametric 
maximum likelihood estimator of the real distribution of Yi (actually, as Yi can 
only take finitely many values, the empirical distribution becomes the maximum 
likelihood estimator of the real distribution) . Draw a pseudo sample Y£ , . . . , Y^j 
from the empirical distribution. 

Algorithm Use the SIS procedure to get Yi = , i = 1,...,N, which should be just a 
sequence of numbers. Draw N elements from this sequence with replacement. 

(2) One Bootstrap replication. 

Concept Consider the pseudo sample YJ, . . . , YJ^ as a "new" sample from the empirical dis- 
tribution, then the cumulative distribution function (CDF) of 9* = T(Y£, . . . , Yjy) 
is a consistent estimator of the CDF of 6 = T(Yi, . . . , Yn). Here we can consider 
our estimator of |E|: 
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T 1 (Y 1 ,...,Y N ) = -^Y i 
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And the cv 2 : 
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Algorithm Treat the pseudo sample as a sample from the SIS and compute the statistics based 
on it. That means, this bootstrap replication can be got by: 

= a7 E Y i ' ™ 2 *! = cv2 °f( Y l • • ■> Y n) 
i=i 

(3) Bootstrap-t Confidence Interval. 

^*l ^*_B 

Concept Repeat the previous two steps until we get B Bootstrap replications: Oi , . . . ,9i , i - 
1, 2. The empirical distribution of #j is the nonparametric maximum likelihood es- 
timator of CDF of 0j , and the latter is consistent estimator of the CDF of B%. So 
we can use (^)IOO^ and (1 — ^)100^ percentiles of the empirical distribution as 
our confidence Interval. 

~ — ~*1 ~^~-*B - — - * 

Algorithm Repeat the previous two steps for B times. For {|E| , . . . , |E| }, define |E|/ \ as 
the 100a t h percentile of the list of values. Then bootstrap-t (1 — a) 100% confidence 
interval of |E| is [|EL a / 2 \, |E1| / 1 _ ct / 2 )]- Similarly we can get confidence interval for 
cv 2 . 



